Read in parameter PMFs #16
I think this looks good. There are maybe a few things we could do to make it even stricter, but I don't think they're necessary; it's already pretty strict.
For generation interval, delays, and right-truncation.

It assumes that:
- We want a PMF
- The PMF is coming from a file with schema specified as in cdcent/cfa-parameter-estimates#9
- The parameter names and disease names follow a specified schema
- The file is in parquet format
- The PMFs are actually proper PMFs

It relaxes the assumptions that:
- The PMFs are coming from the same file
- The PMFs must be present (can skip delays and right-truncation, but not GI)
- The files are in Azure or have a specific name/path

Unit tests cover successfully reading all the parameters individually as well as in combination. They also check for failure in the expected places for desired failure modes.

Switching over to this schema from manual CSVs now is a bit of a choice. I took three cracks at this PR before landing on doing it this way. I think it provides a couple of important benefits to make the switch now, and that comes through in the design of the code here:

1. Our existing CSV-based approach produces 3 distinct files with close-ish schemas, but not an exact match. It requires distinct reader functions and substantially more code than reading from a file with a unified schema like I do here.
2. I think this approach helps avoid issues like this week's production mishap. It allows us to mix and match files as needed, and we can point to a single drop-in file for testing if desired. We don't need to fiddle with the production environment.
3. I think we're going to need to make a switch on the parameter approach sooner or later. I wanted to bite the bullet and get it over with all at once. It's convenient that it helps make the code simpler, but I think making changes swiftly rather than dragging things out provides its own benefit too.
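The "PMFs are actually proper PMFs" check amounts to validating non-negativity and normalization. The package itself is in R, but the logic could be sketched like this (Python used purely for illustration; the function name and tolerance are hypothetical, not the package's actual API):

```python
def is_valid_pmf(probs, tol=1e-10):
    """Check that a vector of probabilities is a proper PMF:
    every entry non-negative and the total summing to 1 within tolerance."""
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) <= tol


# Illustrative usage: a valid PMF passes, an unnormalized or
# negative-entry vector fails.
print(is_valid_pmf([0.2, 0.3, 0.5]))   # proper PMF
print(is_valid_pmf([0.5, 0.6]))        # sums to 1.1
print(is_valid_pmf([-0.1, 1.1]))       # negative entry
```

A reader that enforces a check like this at load time is what lets the rest of the pipeline assume the PMFs are well-formed.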
For additional specificity linking expected errors to tests. I use the regexp option instead of classed errors for `arg_match()` because I had trouble catching those even when matching the class exactly.
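The comment above refers to testthat-style error matching in R, where a test can assert on either an error's class or a regular expression against its message. A language-neutral sketch of the regexp approach (Python, with a hypothetical stand-in reader; not the package's actual code):

```python
import re


def raises_matching(fn, pattern):
    """Return True if calling fn raises an exception whose message
    matches the given regular expression -- analogous to matching an
    expected error by regexp rather than by error class."""
    try:
        fn()
    except Exception as e:
        return re.search(pattern, str(e)) is not None
    return False


def read_pmf(parameter):
    """Hypothetical stand-in for a parameter reader that validates its
    `parameter` argument against a fixed set of allowed names."""
    allowed = ("generation_interval", "delay", "right_truncation")
    if parameter not in allowed:
        raise ValueError(f"`parameter` must be one of {allowed}, not {parameter!r}")
    return parameter
```

Matching on the message text links each unit test to one specific expected failure, at the cost of coupling the tests to the error wording.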
I left some fussy comments, but big-picture this looks good.
Co-authored-by: Katie Gostic (she/her) <[email protected]>
for more information, see https://pre-commit.ci
But I left "state" in the documentation for clarity, because we don't currently do anything at the sub-state level.
Looks good to me :)